Linguistic variations and morphosyntactic annotation of Latin classical texts

Authors

  • Céline Poudat
  • Dominique Longrée
Abstract

This paper assesses the performance of three taggers (MBT, TnT and TreeTagger) when used for the morphosyntactic annotation of classical Latin texts. To this end, we selected the training corpora, as well as the test samples, from the texts of the LASLA database. The texts were chosen for their ability to test the taggers' sensitivity to stylistic, diachronic, generic or discursive variation. On the one hand, this research pinpoints the achievements of each tagger on the various corpora. On the other hand, the paper shows that these taggers can be used as true heuristic instruments and can help to significantly improve the description of the corpus.

KEYWORDS: classical Latin, style, genre, discourse, morphosyntax, taggers.

Similar resources

Automatic linguistic annotation of historical language: ToTrTaLe and XIX century Slovene

The paper describes a tool developed to process historical (Slovene) text, which annotates words in a TEI encoded corpus with their modern-day equivalents, morphosyntactic tags and lemmas. Such a tool is useful for developing historical corpora of highly-inflecting languages, enabling full text search in digital libraries of historical texts, for modernising such texts for today's readers and m...

How to Annotate Linguistic Information in FILES and SCAT

We present a suite of applications used for the Italian Treebank which share their linguistic processor and culminate in a higher-level annotation tool called “FILES”. The first application “FILES” – Fully Integrated Linguistic Environment for Syntactic and Functional Annotation is a prototype for a fully integrated linguistic environment for syntactic functional annotation of corpora. It ta...

Porting an Ancient Greek and Latin Treebank

We have recently converted a dependency treebank, consisting of ancient Greek and Latin texts, from one annotation scheme to another that was independently designed. This paper makes two observations about this conversion process. First, we show that, despite significant surface differences between the two treebanks, a number of straightforward transformation rules yield a substantial level of ...

Corpus, Medical Text, Annotation Morpho-syntactic Tagging, Natural Language Processing Corpus of Medical Texts and Tools

There is only one large corpus of Polish annotated with morpho-syntactic information, namely the IPI PAN Corpus (IPIC). This situation is a big obstacle to the creation of natural language processing tools dedicated to the domain of medical texts. However, real-life medical texts exhibit features that make them very distinct from most of the texts stored in IPIC. In the paper, the attempts...

Porting Elements of the Austrian Baroque Corpus onto the Linguistic Linked Open Data Format

We describe work on porting linguistic and semantic annotation applied to the Austrian Baroque Corpus (ABaC:us) to a format supporting its publication in the Linked Open Data Framework. This work includes several aspects, like a derived lexicon of old forms used in the texts and their mapping to modern German lemmas, the description of morphosyntactic features and the building of domainspecific...

Journal title:
  • TAL

Volume 50  Issue 

Pages  -

Publication date 2009